[ML] Correct query times for model plot and forecast #327

Merged
merged 3 commits into from
Dec 5, 2018

tveasey
Contributor

@tveasey tveasey commented Dec 4, 2018

We were querying for the model bounds and forecast points at the beginning of each bucket. Instead we should match the time offset we apply to bucket samples when we update the model.

The upshot was that model bounds and forecasts were (typically) offset in time with respect to the data values. The problem is particularly noticeable for long bucket lengths. For example, the figures below show the model bounds for 1 day buckets before and after the fix.

Before:
[Image: screenshot 2018-12-04 at 13 04 31]

After:
[Image: screenshot 2018-12-04 at 13 04 55]
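
To make the size of the mismatch concrete, here is a minimal sketch. It assumes, purely for illustration, that the model is updated at the bucket midpoint while queries are made at the bucket start. The names `EOffsetSketch`, `sampleTimeSketch` and `queryLagSketch` are hypothetical and are not part of ml-cpp; the real feature-dependent rule lives in `model_t::sampleTime`.

```cpp
#include <cassert>
#include <cstdint>

using TTime = std::int64_t;

// Hypothetical offset rules. Which real features use which rule is
// decided by model_t::sampleTime and is not reproduced here.
enum class EOffsetSketch { E_BucketStart, E_BucketMidpoint };

// Time associated with a bucket under the given (assumed) rule.
TTime sampleTimeSketch(EOffsetSketch kind, TTime bucketStart, TTime bucketLength) {
    assert(bucketLength > 0);
    return kind == EOffsetSketch::E_BucketMidpoint ? bucketStart + bucketLength / 2
                                                   : bucketStart;
}

// Difference between the time used to update the model and the time
// used to query it for bounds or forecast points.
TTime queryLagSketch(EOffsetSketch updateKind,
                     EOffsetSketch queryKind,
                     TTime bucketStart,
                     TTime bucketLength) {
    return sampleTimeSketch(updateKind, bucketStart, bucketLength) -
           sampleTimeSketch(queryKind, bucketStart, bucketLength);
}
```

Under that assumption, a 1 day bucket (86400 seconds) gives a lag of 43200 seconds, i.e. half a day, while a 5 minute bucket gives only 150 seconds. This is consistent with the offset being most noticeable for long bucket lengths.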

core_t::TTime bucketLength{model.s_ForecastModel->params().bucketLength()};
core_t::TTime startTime{model_t::sampleTime(
feature, forecastJob.s_StartTime, bucketLength)};
core_t::TTime endTime{model_t::sampleTime(

Would it be possible to fix it in CAnomalyJob::doForecast instead? CForecastRunner is just a dumb worker and should not contain any important logic. CAnomalyJob::doForecast calls into the runner and sets startTime to m_LastResultsTime; it seems to me that adjusting it there does the same thing but is a bit cleaner. endTime is just relative to startTime anyway.

Maybe the same can be done for model plots.

Contributor Author

@tveasey tveasey Dec 4, 2018

The problem is that this is feature specific, so it is tricky to push it higher up if the forecast is being run over a job with multiple detectors with different features.

I could create a wrapper which implements the logic in the model library. I can't directly push the feature into the forecast function (because it is in the maths library, which can't depend on EFeature). I could supply a callback to compute the offset start and end times and have this use the wrapper from the model library.
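
A minimal sketch of the callback idea, under stated assumptions: `forecastTimesSketch` stands in for feature-agnostic code on the maths side, and `makeOffsetSketch` stands in for a model-library helper that captures the feature. All names here, including `EFeatureSketch` and its two values, are hypothetical and do not correspond to real ml-cpp types or features.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using TTime = std::int64_t;
using TOffsetFunc = std::function<TTime(TTime)>;

// Maths-library side (sketch): knows nothing about features, it only
// applies whatever offset callback it is handed to each bucket start.
std::vector<TTime> forecastTimesSketch(TTime startTime,
                                       TTime endTime,
                                       TTime bucketLength,
                                       const TOffsetFunc& offset) {
    assert(bucketLength > 0);
    std::vector<TTime> times;
    for (TTime bucketStart = startTime; bucketStart < endTime; bucketStart += bucketLength) {
        times.push_back(offset(bucketStart));
    }
    return times;
}

// Model-library side (sketch): a stand-in for the feature enum. The two
// values are illustrative only and do not name real features.
enum class EFeatureSketch { E_StartAligned, E_MidpointAligned };

// Builds the callback, so the feature dependence stays in this layer.
TOffsetFunc makeOffsetSketch(EFeatureSketch feature, TTime bucketLength) {
    return [feature, bucketLength](TTime bucketStart) {
        return feature == EFeatureSketch::E_MidpointAligned
                   ? bucketStart + bucketLength / 2
                   : bucketStart;
    };
}
```

The dependency only points one way in this arrangement: the code that iterates over buckets sees an opaque `std::function`, and only the layer that already knows about features decides what the offset is.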

Contributor Author

Alternatively, how about I add a function to actually run the forecast to model_t which wraps up this detail. Given we only have the maths::CTimeSeriesModel here (for good reason) this seems like it might be the cleanest option.

@hendrikmuhs hendrikmuhs Dec 4, 2018

> The problem is this is feature specific. So it is tricky to push it higher up if the forecast is being run over a job with multiple detectors with different features.

OK, I see, and I agree that's too complicated.

What about inside of model.s_ForecastModel->forecast(...)?

Contributor Author

That hits the library dependency issue mentioned above. However, what if I add a CForecastDataSink::SForecastModelWrapper::forecast function which takes the forecast job? This could wrap all the functionality that is now in this loop.

Sounds good. I am also OK with keeping the current version, given that the alternatives are too complicated.

Contributor Author

I think I like the idea of wrapping this in SForecastModelWrapper. It seems more natural to me than doing it in this loop, which is really just about scheduling. I'll make the change and see how it looks.

Contributor Author

f980f26. Note that none of the members of SForecastModelWrapper are needed outside of the new forecast function, so I converted it to a class.
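
For readers without the commit to hand, the shape of this refactoring can be sketched roughly as follows. This is not the code in f980f26: `CForecastModelWrapperSketch`, `SForecastJobSketch`, the precomputed `sampleOffset` and the `TPredictFunc` stand-in for the model are all hypothetical, and reporting each point at its query time is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using TTime = std::int64_t;

// Hypothetical job description holding only what this sketch needs.
struct SForecastJobSketch {
    TTime s_StartTime;
    TTime s_Duration;
};

// The wrapper keeps its state private and exposes one forecast call,
// so the scheduling loop never deals with the time offset.
class CForecastModelWrapperSketch {
public:
    using TPredictFunc = std::function<double(TTime)>;
    using TPointVec = std::vector<std::pair<TTime, double>>;

    CForecastModelWrapperSketch(TTime bucketLength, TTime sampleOffset, TPredictFunc predict)
        : m_BucketLength{bucketLength}, m_SampleOffset{sampleOffset},
          m_Predict{std::move(predict)} {
        assert(bucketLength > 0);
    }

    // Queries the model once per bucket at the offset sample time.
    TPointVec forecast(const SForecastJobSketch& job) const {
        TPointVec points;
        TTime endTime{job.s_StartTime + job.s_Duration};
        for (TTime bucketStart = job.s_StartTime; bucketStart < endTime;
             bucketStart += m_BucketLength) {
            TTime queryTime{bucketStart + m_SampleOffset};
            points.emplace_back(queryTime, m_Predict(queryTime));
        }
        return points;
    }

private:
    TTime m_BucketLength;
    TTime m_SampleOffset;
    TPredictFunc m_Predict;
};
```

With this shape, a caller such as the runner loop would only construct the wrapper and call `forecast(job)`; the bucket length, offset and model handle have no reason to be public.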

@hendrikmuhs hendrikmuhs left a comment

LGTM

@droberts195
Contributor

I removed the v6.5.3 label from this PR as this was backed out of 6.5 in #330. We need to put more thought into the impact of changing results document timestamps.

@tveasey
Contributor Author

tveasey commented Dec 7, 2018

We discussed this some more. There were some misunderstandings about the nature of the change, but there was also a change to the default offsets within time buckets at which forecast points were requested. I reverted to the old style of defining the forecast points at "bucket time", i.e. offset zero, in #332. We will target this and #332 together at 6.5.4.

tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Dec 12, 2018
tveasey added a commit that referenced this pull request Dec 13, 2018
@tveasey tveasey deleted the bug/model-plot-forecast-time-offset branch May 1, 2019 14:59